feat: add to concat different data types error message the data types #7166

Merged: tustvold merged 5 commits into apache:main from rluvaton:add-to-concat-what-is-incompatible on Mar 8, 2025
@rluvaton
Member

Which issue does this PR close?

N/A

Rationale for this change

Better debugging experience

What changes are included in this PR?

Added the unique data types to the concat error message and updated the tests.

Are there any user-facing changes?

Yes, users will see a more helpful error message.

@github-actions github-actions Bot added the arrow Changes to the arrow crate label Feb 20, 2025
@tustvold
Contributor

I wonder if we need to incorporate some sort of cardinality limit here, e.g. similar to what we do when printing long arrays. I think this could potentially lead to long error messages, which in turn can lead to application hangs that are hard to diagnose.

WDYT?

Comment thread arrow-select/src/concat.rs Outdated
.map(|dt| format!("{dt}"))
.collect::<Vec<_>>();

// Only sort in tests to make the error message is deterministic
Contributor

Perhaps we could just use a BTreeSet? It will be slightly slower, but having non-deterministic error messages I think would be surprising for people.
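
To illustrate the BTreeSet suggestion, here is a minimal sketch using only the standard library. It is not the PR's code: plain strings stand in for Arrow data types, and the helper name `unique_sorted_types` is hypothetical. Because a `BTreeSet` iterates in sorted order, the output does not depend on hash iteration order.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not from the PR): collect the distinct type names.
/// A BTreeSet iterates in sorted order, so the result is deterministic.
fn unique_sorted_types(types: &[&str]) -> Vec<String> {
    let set: BTreeSet<&str> = types.iter().copied().collect();
    set.into_iter().map(String::from).collect()
}
```

With the input `["Utf8", "Int32", "Utf8", "Int64"]` this returns the names in lexicographic order, regardless of where each type first appeared in the input.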

Member Author

I kept the HashSet, but only for tracking unique values. The error message now lists the data types in input order, which is deterministic and also gives people a sense of where in the input each type appears.
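
The approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the merged code: strings stand in for data types, and `unique_in_input_order` is a hypothetical name. The `HashSet` is used only as a "seen" filter, so the output order comes from the input slice rather than from the set.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not from the PR): keep the first occurrence of each
/// type name, preserving the order in which the types appear in the input.
fn unique_in_input_order(types: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    types
        .iter()
        .copied()
        // `insert` returns true only the first time a value is added.
        .filter(|t| seen.insert(*t))
        .map(String::from)
        .collect()
}
```

Compared with a sorted set, this keeps the first mismatching type close to where it occurs in the argument list, which is the debugging benefit mentioned in the comment.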

rluvaton added 2 commits March 6, 2025 13:59
and also change the data type order to appear in the same order as the arrays for easier debugging
@rluvaton
Member Author

rluvaton commented Mar 6, 2025

> I wonder if we need to incorporate some sort of cardinality limit here, e.g. similar to what we do when printing long arrays. I think this could potentially lead to long error messages, which in turn can lead to application hangs that are hard to diagnose.
>
> WDYT?

I've added a limit of 10
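
One way such a limit could look, as a sketch only: the constant name, the helper `concat_type_error`, the message wording, and the trailing `...` marker are all assumptions for illustration and are not taken from the merged change. Only the limit of 10 comes from the comment above.

```rust
use std::collections::HashSet;

/// The limit of 10 comes from the comment above; the name is hypothetical.
const MAX_TYPES_IN_MESSAGE: usize = 10;

/// Hypothetical helper: build an error message listing at most
/// MAX_TYPES_IN_MESSAGE distinct type names, in input order.
fn concat_type_error(types: &[&str]) -> String {
    let mut seen = HashSet::new();
    let unique: Vec<&str> = types
        .iter()
        .copied()
        .filter(|t| seen.insert(*t))
        .collect();

    let shown = unique
        .iter()
        .take(MAX_TYPES_IN_MESSAGE)
        .copied()
        .collect::<Vec<_>>()
        .join(", ");
    // Assumed convention: mark truncation with a trailing ellipsis.
    let suffix = if unique.len() > MAX_TYPES_IN_MESSAGE { ", ..." } else { "" };
    format!("cannot concatenate arrays of different data types ({shown}{suffix})")
}
```

Capping the list keeps the message length bounded no matter how many distinct types are passed in, which addresses the concern about very long error strings.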

Contributor

@alamb alamb left a comment

Thank you @rluvaton and @tustvold -- this looks like a very nice and useful improvement to me

@tustvold tustvold merged commit f5138fc into apache:main Mar 8, 2025
@rluvaton rluvaton deleted the add-to-concat-what-is-incompatible branch March 9, 2025 11:58